Dashboard

Row

Number of Sold Properties

106981

Median of Sale Price

212500

Town with Most Sold Properties

Bridgeport

Analysis

Getting the data: Exploring open government data and developing a model to predict sale price from the assessed value and the town name, using historical records from the data.ct.gov open data portal.

          Name SerialNbr ListYear DateRecorded AssessedValue SalePrice
1       Wilton     11245     2011   2011-05-23        552090    591500
2     Guilford    110276     2011   2011-08-23        208800    265000
3   Bridgeport    110038     2011   2011-10-01       2182650     72500
4      Tolland    120001     2012   2011-10-01        205495    336000
5   Farmington    110001     2011   2011-10-02        254610    350000
6       Monroe     11002     2011   2011-10-02        251370    313700
7  Wallingford    110005     2011   2011-10-02        207600    280000
8   West Haven    110010     2011   2011-10-02        143500    154900
9      Ansonia       110     2011   2011-10-03        242500    225000
10     Ansonia       111     2011   2011-10-03        171500    225000
   AdditionalRemarks SalesRatio NonUseCode ResidentialType
1                  0  0.9333728         25               1
2                  0  0.7879245         14               1
3                 NA 30.1055172         25                
4                  0  0.6115923         28               1
5                 NA  0.7274571         NA               1
6                 NA  0.8013070         25               1
7                 NA  0.7414286         NA               1
8                 NA  0.9264041         14               1
9                 NA  1.0777778         24                
10                NA  0.7622222         24                
   ResidentialUnits                 Address
1                 1  25 BUCKINGHAM RIDGE RD
2                 1           5 ROSEMARY LN
3                 0     20 HADDON ST UNIT 2
4                 1 1235 TOLLAND STAGE ROAD
5                 1          42 HIGHWOOD RD
6                 1       45 TWIN BROOK TER
7                 1             5 LORI LANE
8                 1        44 PHILLIPS TERR
9                 0          47 PERSHING DR
10                0          51 PERSHING DR
                                                                        Location
1    25 BUCKINGHAM RIDGE RD\nWilton, CT\n(41.19885403200004, -73.41298888899996)
2           5 ROSEMARY LN\nGuilford, CT\n(41.27730713900007, -72.67815227299997)
3   20 HADDON ST UNIT 2\nBridgeport, CT\n(41.15726416200005, -73.22493095999994)
4  1235 TOLLAND STAGE ROAD\nTolland, CT\n(41.88591704400005, -72.33325819399994)
5        42 HIGHWOOD RD\nFarmington, CT\n(41.75635671400005, -72.86742348799999)
6        45 TWIN BROOK TER\nMonroe, CT\n(41.331306522000034, -73.21813342499996)
7          5 LORI LANE\nWallingford, CT\n(41.47644067200008, -72.84534640899994)
8     44 PHILLIPS TERR\nWest Haven, CT\n(41.272255536000046, -72.96408639599997)
9           47 PERSHING DR\nAnsonia, CT\n(41.33439924700008, -73.08361604599997)
10         51 PERSHING DR\nAnsonia, CT\n(41.334579002000055, -73.08357571599998)

Data Cleaning & Wrangling: The original data included punctuation (such as a leading "$") in the sale and assessed prices.

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
        0    125000    212500    400900    361500 152400000 
[1] 0.4199754

Exploratory Data Analysis: Discovering correlations between independent and dependent variables through graphic representation.

Feature Engineering

Modeling

Details

Summary of Data Source: The Office of Policy and Management maintains a listing of all real estate sales with a sales price of $2,000 or greater that occur between October 1 and September 30 of each year. For each sale record, the file includes: town, property address, date of sale, property type (residential, apartment, commercial, industrial or vacant land), sales price, and property assessment.

Source: https://data.ct.gov/Housing-and-Development/Real-Estate-Sales-By-Town-for-2011-2012-2013/8udc-aepg

---
title: "Connecticut Real Estate Sales from 2011-2013"
output: 
  flexdashboard::flex_dashboard:
    source_code: embed
    theme: cosmo
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(flexdashboard)
library(RSocrata)
library(DT)
library(ggplot2)
library(plotly)
library(dplyr)
```


Dashboard
========================================


```{r, initial_data, include=FALSE}
url = "https://data.ct.gov/Housing-and-Development/Real-Estate-Sales-By-Town-for-2011-2012-2013/8udc-aepg"
real_estate <- read.socrata(url)

# get some small results for dashboard boxes
tbl = as.data.frame(table(real_estate$Name))

tbl$Freq <- as.numeric(tbl$Freq)

mx <- as.character(tbl$Var1[which(tbl$Freq == max(tbl$Freq))])

```


Row
-----------------------------------------------------------------------

### Number of Sold Properties

```{r, vb1}
nums <- nrow(real_estate)
valueBox(nums, icon = "fa-home")
```


### Median of Sale Price
```{r, vb2}
med <- median(real_estate$SalePrice)
valueBox(med, icon = "fa-usd")
```


### Town with Most Sold Properties
```{r, vb3}
valueBox(mx, icon = "fa-road")

```


Analysis {.storyboard}
=========================================

### **Getting the data**: Exploring open government data and developing a model to predict **sale price** from the assessed value and the town name, using historical records from the [data.ct.gov](https://data.ct.gov) open data portal.

```{r, dt, echo=FALSE}

head(real_estate, n=10)
```
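
The `SalesRatio` column in the preview appears to be `AssessedValue` divided by `SalePrice`. The sketch below checks that reading against three rows copied from the `head()` output; the data frame name `preview` and the column `Recomputed` are introduced here for illustration only and are not part of the original analysis.

```r
# Three rows copied by hand from the head() preview (Wilton, Guilford, Bridgeport)
preview <- data.frame(
  Name          = c("Wilton", "Guilford", "Bridgeport"),
  AssessedValue = c(552090, 208800, 2182650),
  SalePrice     = c(591500, 265000, 72500),
  SalesRatio    = c(0.9333728, 0.7879245, 30.1055172)
)

# Recompute the ratio and compare it with the published column
preview$Recomputed <- preview$AssessedValue / preview$SalePrice
ratio_matches <- isTRUE(all.equal(preview$Recomputed, preview$SalesRatio,
                                  tolerance = 1e-6))
ratio_matches
```

If the check holds on the full data as well, a ratio far above 1 (as in the Bridgeport row) flags a sale well below the assessed value, which may be worth inspecting before modeling.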


### **Data Cleaning & Wrangling**: The original data included punctuation (such as a leading "$") in the sale and assessed prices.

```{r, wrangle}

# pre-processing: strip a single leading punctuation character such as "$"
real_estate$AssessedValue <- gsub("^[[:punct:]]", "", real_estate$AssessedValue)
real_estate$SalePrice <- gsub("^[[:punct:]]", "", real_estate$SalePrice)
# convert AssessedValue and SalePrice from chr to numeric (stored in new columns)
real_estate$AssessedValue2 <- as.numeric(real_estate$AssessedValue)
real_estate$SalePrice2 <- as.numeric(real_estate$SalePrice)
# summary statistics of the response (sale price)
summary(real_estate$SalePrice2)
# correlation between sale price and assessed value
cor(real_estate$SalePrice2, real_estate$AssessedValue2) # about 0.42, a moderate positive correlation

```
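
As a small illustration of the cleaning step, the sketch below uses made-up price strings; the exact raw format of the portal's columns is an assumption, and `raw_price` and `clean_price` are hypothetical names. It contrasts the anchored pattern used above, which removes only one leading symbol, with a broader pattern that also drops thousands separators. The broader pattern is only safe on character input: numbers that R has already converted to text in scientific notation (for example `"1e+05"`) would be mangled by it.

```r
# Hypothetical raw values, assumed to arrive as character strings
raw_price <- c("$591,500", "$265,000", "72500")

# Anchored pattern: removes a single leading punctuation character only,
# so embedded commas survive and as.numeric() would return NA for them
leading_only <- gsub("^[[:punct:]]", "", raw_price)
leading_only

# Broader pattern: keep digits and the decimal point, drop everything else
clean_price <- as.numeric(gsub("[^0-9.]", "", raw_price))
clean_price
```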

### **Exploratory Data Analysis**: Discovering correlations between independent and dependent variables through graphic representation.

```{r, eda, echo=FALSE}
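# scatterplot of sale price against assessed value, one panel per listing year
# background data: drop ListYear so all points appear in grey in every panel
b_real_estate <- real_estate[, names(real_estate) != "ListYear"]

# geom_point() uses the colour aesthetic by default; fill has no effect on its default shape
s <- ggplot(real_estate, aes(x = AssessedValue2, y = SalePrice2, colour = factor(ListYear))) + 
            geom_point(data = b_real_estate, colour = "grey", alpha = 0.3) +
            geom_point() +
            facet_wrap(~ ListYear) +
            theme_bw() +
            labs(title = "Real Estate Prices by Listing Year", 
                 x = "Assessed Value ($)", y = "Sale Price ($)") +
            guides(colour = "none")

ggplotly(s)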

```


### Feature Engineering


### Modeling
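
This section is empty in the source. One possible starting point, given the stated goal of predicting sale price from the assessed value and the town name, is an ordinary linear regression. The sketch below is an assumption rather than the author's model: it runs on simulated data (`toy`, with an arbitrary slope of 1.4) so that it is self-contained, and reuses the column names `AssessedValue2`, `SalePrice2`, and `Name` from the wrangling step.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for real_estate; all values here are invented
n <- 200
toy <- data.frame(
  Name = factor(sample(c("Ansonia", "Bridgeport", "Wilton"), n, replace = TRUE)),
  AssessedValue2 = runif(n, 50000, 500000)
)
toy$SalePrice2 <- 1.4 * toy$AssessedValue2 + rnorm(n, sd = 20000)

# Linear model: sale price as a function of assessed value and town
fit <- lm(SalePrice2 ~ AssessedValue2 + Name, data = toy)
coef(fit)

# Predict the sale price of a hypothetical Wilton property assessed at $200,000
new_home <- data.frame(
  AssessedValue2 = 200000,
  Name = factor("Wilton", levels = levels(toy$Name))
)
pred <- predict(fit, newdata = new_home)
pred
```

On the real data, the long right tail of sale prices (a maximum of about 152 million against a median of 212,500) suggests that a log transform or outlier filtering may be worth trying before fitting.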


Details
=========================================

*Summary of Data Source:* The Office of Policy and Management maintains a listing of all real estate sales with a sales price of $2,000 or greater that occur between October 1 and September 30 of each year. For each sale record, the file includes: town, property address, date of sale, property type (residential, apartment, commercial, industrial or vacant land), sales price, and property assessment.

**Source**: [https://data.ct.gov/Housing-and-Development/Real-Estate-Sales-By-Town-for-2011-2012-2013/8udc-aepg](https://data.ct.gov/Housing-and-Development/Real-Estate-Sales-By-Town-for-2011-2012-2013/8udc-aepg)